Transformer-based architectures have become competitive across a variety of visual domains, most notably images and videos. While prior work has studied these modalities in isolation, having a common architecture suggests that one can train a single unified model for multiple visual modalities. Prior attempts at unified modeling typically used architectures tailored for vision tasks, or obtained worse performance compared to single-modality models. In this work, we show that masked autoencoding can be used to train a simple Vision Transformer on images and videos, without requiring any labeled data. This single model learns visual representations that are comparable to or better than single-modality representations on both image and video benchmarks, while using a much simpler architecture. In particular, our single pretrained model can be finetuned to achieve 86.5% on ImageNet and 75.3% on the challenging Something-Something v2 video benchmark. Furthermore, the model can be learned by dropping 90% of the image patches and 95% of the video patches, enabling extremely fast training.
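To make the masked-autoencoding recipe above concrete, the following is a minimal PyTorch sketch, not the authors' architecture: a tiny Transformer encoder sees only the roughly 10% of image patches (5% for video) that survive masking, and a small decoder with a learned mask token reconstructs the full patch sequence. All dimensions are toy values chosen for illustration.

import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    def __init__(self, num_patches=196, dim=192, mask_ratio=0.9):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=1)
        self.head = nn.Linear(dim, dim)  # predict the (already-embedded) patch targets

    def forward(self, patches):                       # patches: (B, N, dim)
        B, N, D = patches.shape
        x = patches + self.pos
        keep = int(N * (1 - self.mask_ratio))         # e.g. only 10% of image patches kept
        idx = torch.rand(B, N).argsort(dim=1)[:, :keep]
        visible = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, D))
        latent = self.encoder(visible)                # encoder sees visible tokens only
        # scatter encoded tokens back and fill the rest with a learned mask token
        full = self.mask_token.expand(B, N, D).clone()
        full.scatter_(1, idx.unsqueeze(-1).expand(-1, -1, D), latent)
        pred = self.head(self.decoder(full + self.pos))
        # toy reconstruction loss over all patches (the real method scores masked patches only)
        return ((pred - patches) ** 2).mean()

loss = TinyMAE()(torch.randn(2, 196, 192))            # image case: 90% of patches dropped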
We introduce PyTorchVideo, an open-source deep learning library that provides a rich set of modular, efficient, and reproducible components for a variety of video understanding tasks, including classification, detection, self-supervised learning, and low-level processing. The library covers a full stack of video understanding tools, including multimodal data loading, transformations, and models that reproduce state-of-the-art performance. PyTorchVideo further supports hardware acceleration, enabling real-time inference on mobile devices. The library is based on PyTorch and can be used by any training framework, for example PyTorchLightning, PySlowFast, or Classy Vision. PyTorchVideo is available at https://pytorchvideo.org/
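As a usage sketch, the snippet below loads a pretrained video classifier through torch.hub in the way the PyTorchVideo tutorials describe; the model name, pretrained weights, and input layout are assumptions based on the Kinetics recipes and may differ across library versions.

import torch

# Assumed hub entry point from the PyTorchVideo tutorials; adjust to your installed version.
model = torch.hub.load("facebookresearch/pytorchvideo", "slow_r50", pretrained=True)
model = model.eval()

# Dummy clip laid out as (batch, channels, time, height, width); slow_r50 expects 8 frames.
clip = torch.randn(1, 3, 8, 256, 256)
with torch.no_grad():
    logits = model(clip)          # Kinetics-400 class scores
print(logits.shape)               # torch.Size([1, 400])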
Anomaly detection on time series data is increasingly common across various industrial domains that monitor metrics in order to prevent potential accidents and economic losses. However, a scarcity of labeled data and ambiguous definitions of anomalies can complicate these efforts. Recent unsupervised machine learning methods have made remarkable progress in tackling this problem using either single-timestamp predictions or time series reconstructions. While traditionally considered separately, these methods are not mutually exclusive and can offer complementary perspectives on anomaly detection. This paper first highlights the successes and limitations of prediction-based and reconstruction-based methods with visualized time series signals and anomaly scores. We then propose AER (Auto-encoder with Regression), a joint model that combines a vanilla auto-encoder and an LSTM regressor to incorporate the successes and address the limitations of each method. Our model can produce bi-directional predictions while simultaneously reconstructing the original time series by optimizing a joint objective function. Furthermore, we propose several ways of combining the prediction and reconstruction errors through a series of ablation studies. Finally, we compare the performance of the AER architecture against two prediction-based methods and three reconstruction-based methods on 12 well-known univariate time series datasets from NASA, Yahoo, Numenta, and UCR. The results show that AER has the highest averaged F1 score across all datasets (a 23.5% improvement compared to ARIMA) while retaining a runtime similar to its vanilla auto-encoder and regressor components. Our model is available in Orion, an open-source benchmarking tool for time series anomaly detection.
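A minimal sketch of the joint idea, with toy dimensions and not reflecting the released Orion implementation: a shared encoder feeds both a reconstruction decoder and a small regression head that predicts one step backward and one step forward, and training sums the two error terms.

import torch
import torch.nn as nn

class AERSketch(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.encoder = nn.LSTM(1, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.recon_head = nn.Linear(hidden, 1)        # reconstruct the input window
        self.pred_head = nn.Linear(hidden, 2)         # predict one step back and one step forward

    def forward(self, x):                             # x: (B, T, 1)
        h, _ = self.encoder(x)
        recon = self.recon_head(self.decoder(h)[0])   # (B, T, 1)
        preds = self.pred_head(h[:, -1])              # (B, 2): values before and after the window
        return recon, preds

def joint_loss(recon, preds, x, x_prev, x_next):
    recon_err = ((recon - x) ** 2).mean()
    pred_err = ((preds[:, 0] - x_prev) ** 2 + (preds[:, 1] - x_next) ** 2).mean()
    return recon_err + pred_err                       # joint objective

At inference time, an anomaly score can combine both error types, for example as a weighted sum or product over a sliding window, which is the kind of combination the ablation studies above compare.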
Recent advances in neural radiance fields have enabled the high-fidelity 3D reconstruction of complex scenes for novel view synthesis. However, it remains underexplored how the appearance of such representations can be efficiently edited while maintaining photorealism. In this work, we present PaletteNeRF, a novel method for photorealistic appearance editing of neural radiance fields (NeRF) based on 3D color decomposition. Our method decomposes the appearance of each 3D point into a linear combination of palette-based bases (i.e., 3D segmentations defined by a group of NeRF-type functions) that are shared across the scene. While our palette-based bases are view-independent, we also predict a view-dependent function to capture the color residual (e.g., specular shading). During training, we jointly optimize the basis functions and the color palettes, and we also introduce novel regularizers to encourage the spatial coherence of the decomposition. Our method allows users to efficiently edit the appearance of the 3D scene by modifying the color palettes. We also extend our framework with compressed semantic features for semantic-aware appearance editing. We demonstrate that our technique is superior to baseline methods both quantitatively and qualitatively for appearance editing of complex real-world scenes.
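The decomposition can be summarized with a toy numeric sketch (notation assumed, not the paper's code): each 3D point carries view-independent weights over a small palette shared across the scene, and a view-dependent residual is added on top for effects such as specular shading.

import numpy as np

rng = np.random.default_rng(0)
K = 4                                   # number of palette colors shared across the scene
palette = rng.random((K, 3))            # optimized jointly with the per-point basis functions

def point_color(weights, residual):
    """weights: (K,) non-negative per-point basis weights; residual: (3,) view-dependent term."""
    return weights @ palette + residual

w = np.array([0.7, 0.2, 0.1, 0.0])      # e.g. output of the per-point basis functions
print(point_color(w, residual=np.zeros(3)))

# Editing the appearance then amounts to changing `palette` (recoloring a basis)
# while the learned weights and residual stay fixed.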
Fine-tuning pre-trained language models (PLMs) achieves impressive performance on a range of downstream tasks, and their sizes have consequently been getting bigger. Since a different copy of the model is required for each task, this paradigm is infeasible for storage-constrained edge devices like mobile phones. In this paper, we propose SPARTAN, a parameter efficient (PE) and computationally fast architecture for edge devices that adds hierarchically organized sparse memory after each Transformer layer. SPARTAN freezes the PLM parameters and fine-tunes only its memory, thus significantly reducing storage costs by re-using the PLM backbone for different tasks. SPARTAN contains two levels of memory, with only a sparse subset of parents being chosen in the first level for each input, and children cells corresponding to those parents being used to compute an output representation. This sparsity combined with other architecture optimizations improves SPARTAN's throughput by over 90% during inference on a Raspberry Pi 4 when compared to PE baselines (adapters) while also outperforming the latter by 0.1 points on the GLUE benchmark. Further, it can be trained 34% faster in a few-shot setting, while performing within 0.9 points of adapters. Qualitative analysis shows that different parent cells in SPARTAN specialize in different topics, thus dividing responsibility efficiently.
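The two-level memory can be illustrated with the following sketch (dimensions and the scoring rule are assumptions rather than the SPARTAN release): a query first scores parent cells, only a sparse top-k subset of parents is kept, and the output is computed by attending over the children of those parents alone.

import torch
import torch.nn.functional as F

def sparse_memory(query, parent_keys, child_values, k=2):
    """query: (d,); parent_keys: (P, d); child_values: (P, C, d); returns (d,)."""
    parent_scores = parent_keys @ query                        # (P,)
    top = parent_scores.topk(k).indices                        # sparse subset of parents
    children = child_values[top].reshape(-1, query.shape[0])   # (k*C, d) children of chosen parents
    attn = F.softmax(children @ query, dim=0)                  # attend only over selected children
    return attn @ children

d, P, C = 16, 8, 4
out = sparse_memory(torch.randn(d), torch.randn(P, d), torch.randn(P, C, d))
print(out.shape)  # torch.Size([16])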
Mathematical reasoning is a core ability of human intelligence and poses unique challenges for machines in abstract thinking and logical reasoning. Recent large pre-trained language models such as GPT-3 have achieved remarkable progress on mathematical reasoning tasks written in text form, such as math word problems (MWPs). However, it is unknown whether these models can handle more complex problems that involve mathematical reasoning over, for example, tabular data. To fill the gap, we present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 open-domain grade-level problems that require mathematical reasoning on both textual and tabular data. Each question in TabMWP is aligned with a tabular context, which is presented as an image, semi-structured text, and a structured table. There are two types of questions, free-text and multiple-choice, and each problem is annotated with a gold solution that reveals the multi-step reasoning process. We evaluate different pre-trained models on TabMWP, including the GPT-3 model in a few-shot setting. As earlier studies have shown, because few-shot GPT-3 relies on the selection of in-context examples, its performance is unstable and can degrade to near chance. The instability is more severe when handling complex problems like TabMWP. To mitigate this, we further propose a novel approach, PromptPG, which utilizes policy gradient to learn to select in-context examples from a small amount of training data and then constructs the corresponding prompt for the test example. Experimental results show that our method outperforms the best baseline on the accuracy metric and significantly reduces the prediction variance compared to random selection, which verifies its effectiveness in selecting in-context examples.
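A REINFORCE-style sketch of the example-selection idea, with an assumed 0/1 reward and a plain logit-per-candidate policy standing in for the language-model encoder the paper uses:

import torch
import torch.nn.functional as F

scores = torch.zeros(20, requires_grad=True)       # one learnable logit per candidate example
opt = torch.optim.Adam([scores], lr=0.1)

def answer_is_correct(chosen_ids, problem):
    # Placeholder for "build a prompt from the chosen examples, query the model,
    # and check the answer"; returns a 0/1 reward.
    return float(torch.rand(()) < 0.5)

for step in range(100):
    probs = F.softmax(scores, dim=0)
    dist = torch.distributions.Categorical(probs)
    picks = dist.sample((2,))                       # sample 2 in-context examples
    reward = answer_is_correct(picks.tolist(), problem=None)
    loss = -dist.log_prob(picks).sum() * reward     # policy-gradient update toward useful examples
    opt.zero_grad()
    loss.backward()
    opt.step()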
When answering a question, humans leverage information available across different modalities to synthesize a consistent and complete chain of thought (CoT). In the case of deep learning models such as large-scale language models, this process is usually a black box. Recently, science question benchmarks have been used to diagnose the multi-hop reasoning ability and interpretability of AI systems. However, existing datasets fail to provide annotations for the answers, or are restricted to a text-only modality, small scale, and limited domain diversity. To this end, we present Science Question Answering (SQA), a new benchmark consisting of ~21k multimodal multiple-choice questions covering a diverse set of science topics, with answer annotations accompanied by corresponding lectures and explanations. We further design language models to learn to generate lectures and explanations as the chain of thought (CoT) to mimic the multi-hop reasoning process when answering SQA questions. SQA demonstrates the utility of CoT in language models: CoT improves question answering performance by 1.20% for few-shot GPT-3 and 3.99% for UnifiedQA. We also explore the upper bound of models in leveraging explanations by feeding them into the input; we observe that this improves the few-shot performance of GPT-3 by 18.96%. Our analysis further shows that language models, similar to humans, benefit from explanations, learning from less data and achieving the same performance with only 40% of the data.
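A small sketch of what such a chain-of-thought prompt can look like (field names and their ordering are assumptions for illustration): a demonstration carries the lecture and explanation alongside the answer, and the model is asked to continue the same pattern for the test question.

def build_cot_prompt(example):
    return (
        f"Question: {example['question']}\n"
        f"Options: {', '.join(example['choices'])}\n"
        f"Lecture: {example['lecture']}\n"
        f"Explanation: {example['explanation']}\n"
        f"Answer: {example['answer']}\n"
    )

demo = {
    "question": "Which property do these objects have in common?",
    "choices": ["hard", "soft"],
    "lecture": "A property is something you can observe about an object...",
    "explanation": "Each object in the set can be squeezed easily, so...",
    "answer": "soft",
}
# The model continues the test block with lecture -> explanation -> answer.
test_block = "Question: ...\nOptions: ...\nLecture:"
prompt = build_cot_prompt(demo) + test_block
print(prompt)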
Reinforcement learning (RL) is a machine learning paradigm in which an autonomous agent learns to make an optimal sequence of decisions by interacting with an underlying environment. The promise demonstrated by RL-guided workflows in untangling electronic design automation problems has encouraged hardware security researchers to leverage autonomous RL agents for domain-specific problems. From a hardware security perspective, such autonomous agents can generate optimal actions in unknown adversarial environments. At the same time, the continued globalization of the integrated circuit supply chain has pushed chip fabrication to offshore, untrusted entities, heightening concerns about hardware security. Moreover, the unknown adversarial environment and growing design complexity make it challenging for defenders to detect the subtle modifications made by attackers, known as hardware Trojans. In this brief, we provide an overview of the development of RL agents for detecting hardware Trojans, one of the most challenging hardware security problems. We also outline potential opportunities and enumerate the challenges of applying RL to hardware security problems.
Stealthy hardware Trojans (HTs) inserted during the fabrication of integrated circuits can bypass the security of critical infrastructures. Although researchers have proposed many techniques to detect HTs, several limitations exist, including (i) a low success rate, (ii) high algorithmic complexity, and (iii) a large number of required test patterns. Moreover, the most pertinent drawback of prior detection techniques stems from an incorrect evaluation methodology: they assume that an adversary inserts HTs randomly. Such inappropriate adversarial assumptions enable detection techniques to claim high HT detection accuracy, leading to a "false sense of security." Unfortunately, to the best of our knowledge, despite more than a decade of research on detecting HTs inserted during fabrication, there have been no concerted efforts to systematically evaluate HT detection techniques. In this paper, we play the role of a realistic adversary and question the efficacy of HT detection techniques by developing ATTRITION, an automated, scalable, and practical attack framework using reinforcement learning (RL). ATTRITION evades eight detection techniques across two HT detection categories, demonstrating its agnostic behavior. ATTRITION achieves average attack success rates of $47\times$ and $211\times$ compared to randomly inserted HTs. We demonstrate ATTRITION's ability to evade detection techniques by evaluating designs ranging from widely used academic suites to larger designs such as the open-source MIPS and mor1kx processors to AES and a GPS module. Additionally, we showcase the impact of ATTRITION-generated HTs through two case studies (privilege escalation and kill switch) on the mor1kx processor. We envision that our work, together with the released HT benchmarks and models, will foster the development of better HT detection techniques.
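A schematic sketch of the attack loop (a placeholder environment and a toy value table, not the ATTRITION framework): the agent repeatedly picks candidate trigger nets for a Trojan, and nets that lead to undetected Trojans accumulate higher value.

import random

candidate_nets = [f"net_{i}" for i in range(1000)]   # internal signals of the design
q_values = {net: 0.0 for net in candidate_nets}      # simple per-net value table

def evades_detection(trigger_nets):
    # Placeholder: run the targeted detection technique on the design with the
    # inserted Trojan and return True if the Trojan goes unnoticed.
    return random.random() < 0.1

for episode in range(500):
    # epsilon-greedy selection of 4 trigger nets
    if random.random() < 0.2:
        trigger = random.sample(candidate_nets, 4)
    else:
        trigger = sorted(candidate_nets, key=q_values.get, reverse=True)[:4]
    reward = 1.0 if evades_detection(trigger) else 0.0
    for net in trigger:                               # update values toward the observed reward
        q_values[net] += 0.1 * (reward - q_values[net])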
The insertion of hardware Trojans (HTs) in integrated circuits is a pernicious threat. Since HTs are activated only under rare trigger conditions, detecting them through random logic simulation is infeasible. In this work, we design a reinforcement learning (RL) agent that circumvents the exponential search space and returns a minimal set of test patterns that is most likely to detect HTs. Experimental results on a variety of benchmarks demonstrate the efficacy and scalability of our RL agent, which achieves a significant reduction ($169\times$) in the number of required test patterns while maintaining or improving coverage ($95.75\%$) compared to state-of-the-art techniques.
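A schematic sketch of the underlying objective (a placeholder simulation, not the paper's agent): keep adding the test pattern that excites the most not-yet-covered rare trigger conditions until coverage saturates; the RL agent replaces this greedy choice with a learned policy that scales to the exponential pattern space.

import random

random.seed(0)
patterns = [random.getrandbits(32) for _ in range(10_000)]   # candidate stimuli

def covered_triggers(pattern):
    # Placeholder: simulate the design and return the set of rare (signal, value)
    # trigger conditions this pattern excites.
    return {pattern % 50, (pattern >> 7) % 50}

selected, covered = [], set()
while len(covered) < 50 and patterns:
    # value of a pattern = number of not-yet-covered trigger conditions it adds
    best = max(patterns, key=lambda p: len(covered_triggers(p) - covered))
    selected.append(best)
    covered |= covered_triggers(best)
    patterns.remove(best)

print(len(selected), "patterns reach", len(covered), "trigger conditions")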